Abstract: In this paper, we consider the dynamic power control for
delay-aware D2D communications. The stochastic optimization problem is
formulated as an infinite horizon average cost Markov decision process.
To deal with the curse of dimensionality, we utilize the interference
filtering property of the CSMA-like MAC protocol and derive a closed-form
approximate priority function and the associated error bound using
perturbation analysis. Based on the closed-form approximate priority
function, we propose a low-complexity power control algorithm solving the
per-stage optimization problem. The proposed solution is further shown to
be asymptotically optimal for a sufficiently large carrier sensing
distance. Finally, the proposed power control scheme is compared with
various baselines through simulations, and it is shown that significant
performance gain can be achieved.


I. Introduction 

Future wireless cellular networks (e.g., IMT-Advanced) are expected to
provide higher data rates and system capacity. One potential technology
to meet these demands is infrastructure-assisted device-to-device (D2D)
communications [1]. Taking advantage of the physical proximity of
communication devices, the D2D technique enables direct communications
between devices, which results in high data rates, low delays and low
power consumption. Unlike conventional ad hoc networks, the cellular base
station (BS) plays an important role in D2D communications by helping the
D2D nodes with both peer discovery and resource allocation [2]. There are
several existing works on D2D communications in cellular networks. In [3]
and [4], the D2D nodes share the spectrum with cellular users using an
underlay approach, in which the throughput of D2D communications is
maximized while the QoS of the cellular users is guaranteed. In [5] and
[6], the maximum sum-rate of the network is achieved by dynamically
selecting one of the transmission modes, including D2D mode with shared
channels, D2D mode with dedicated channels and cellular transmission
mode. In [7], the multi-antenna cellular BS acts as a cooperative relay,
helping the D2D nodes forward packets so as to improve the throughput of
the network. Power control is important for interference coordination
among the nodes in wireless networks. The transmit power is adjusted to
meet the users' required signal to interference plus noise ratios (SINR)
[8], satisfy the received signal power level [9] or achieve a higher data
rate [10]. In [11], the transmit power is minimized for D2D
communications subject to a sum-rate constraint. However, these existing
works have all focused on the physical layer performance without
considering the bursty data arrivals at the transmitters or the delay
requirements of the information flows. Since real-life applications (such
as video streaming, web browsing or VoIP) are delay-sensitive, it is
important to optimize the delay performance for D2D communications.

To take the queueing delay into consideration, the radio resource control
policy should be a function of both the channel state information (CSI)
and the queue state information (QSI). This is because the CSI reveals
the instantaneous transmission opportunities at the physical layer and
the QSI reveals the urgency of the data flows. However, the associated
optimization problem is very challenging. A systematic approach to the
delay-aware optimization problem is through the Markov Decision Process
(MDP). In general, the optimal control policy can be obtained by solving
the well-known Bellman equation. Conventional solutions to the Bellman
equation, such as brute-force value iteration or policy iteration [12],
have huge complexity (i.e., the curse of dimensionality), because solving
the Bellman equation involves solving an exponentially large system of
non-linear equations. There are some existing works that use the
stochastic approximation approach with a distributed online learning
algorithm [13], which has linear complexity. However, the stochastic
learning approach can only give a numerical solution to the Bellman
equation and may suffer from slow convergence and lack of insight. We
treat this issue and provide some preliminary results on cross-layer
design with closed-form solution in [14].

In this paper, we investigate the dynamic power control for 
D2D communications systems. We focus on minimizing the 
average transmit power and the average delay of the D2D data 
flows. There are several technical challenges associated with 
the dynamic power control optimization problem. 

• Challenges due to the Average Delay Consideration:
Unlike other papers which optimize the physical layer
throughput of the D2D systems, the optimization involving
delay constraints is fundamentally challenging. This
is because the associated problem belongs to the class
of stochastic optimization [15], which embraces both
information theory (to model the physical layer dynamics)
and queueing theory (to model the queue dynamics). A
key obstacle to solving the associated Bellman equation
is to obtain the priority function, and there is no easy and
systematic solution in general [12].

Fig. 1. Topology of an infrastructure-assisted D2D communications system.

• Challenges due to the Coupled Queue Dynamics: The
interference among the D2D nodes [16], [17] fundamentally
induces coupled queue dynamics among the D2D
flows. For instance, the service rate of the queue for
each D2D flow depends on the transmit power of all the
other active D2D flows due to the mutual interference.
The associated stochastic optimization problem is a
K-dimensional MDP, where K is the number of D2D
flows. This K-dimensional MDP leads to the curse of
dimensionality, with complexity exponential in K for
solving the associated Bellman equation. It is highly
nontrivial to obtain a low complexity solution for the
dynamic resource control of the D2D systems.

• Challenges due to the Non-Convexity: Besides
the complexity issue involved in obtaining the priority
function for the stochastic optimization problem, the
per-stage control optimization in the Bellman equation is
also non-convex due to the mutual interference term in
the mutual information. This poses a great challenge in
solving the delay-constrained optimization in the D2D
systems.

In this paper, we first establish the PHY, MAC and bursty data source
models as well as the queue dynamics in Section II. We formally formulate
the associated stochastic optimization problem of the dynamic power
control for delay-aware D2D communications as an infinite horizon average
cost MDP. To overcome the aforementioned technical challenges, we exploit
specific problem structures in D2D communications. Specifically, 1) the
CSMA-like MAC protocol is adopted to coordinate the transmissions of the
D2D nodes in a distributed way, and this induces a weak interference
topology among the simultaneously transmitting D2D nodes, and 2) the
assistance of the BS substantially simplifies the signaling mechanism of
control information exchange. We derive a simplified optimality condition
for solving the MDP in Section III. Compared with the conventional
Bellman equation [12], the derived optimality condition involves solving
a K-dimensional partial differential equation (PDE) only. Utilizing the
interference filtering property of the MAC protocol, we obtain a
closed-form approximate priority function and the associated error bound
using perturbation analysis. Based on that, we obtain a delay-aware low
complexity dynamic power control algorithm for the D2D communications in
Section IV. The solution is shown to be asymptotically optimal for a
sufficiently large carrier sensing distance in the MAC protocol.
Furthermore, in Section V, we show that the proposed solution achieves
significant performance gain over various baseline schemes.

II. System Model 

In this section, we introduce the system model for the
infrastructure-assisted D2D communications, including the
D2D system topology, the physical layer model, the MAC
layer model and the bursty data source model. We first list the
important notations in this paper in Table I.


TABLE I
List of important notations

Symbol               Meaning
K                    number of D2D pairs
P = {P_k}            transmit power
H = {H_kj}           global CSI
L = {L_kj}           large-scale path gain
σ = {σ_k}            MAC output
v = {v_k}            probability of accessing the channel
A = {A_k}            bit/packet arrival
λ = {λ_k}            average arrival rate
Q = {Q_k}            global QSI
χ = {σ, H, Q}        global system state
Ω(χ) = {Ω_k(χ)}      power control policy
τ                    duration of a time slot
C_k(H, P)            achievable data rate of the k-th D2D pair
δ                    carrier sensing distance
L^δ                  worst-case cross-channel path gain
V*(Q)                priority function


A. D2D System Topology 

We consider an infrastructure-assisted D2D communications
system, as shown in Fig. 1. Specifically, the D2D system
consists of two tiers, namely the cellular tier and the D2D tier.
In the D2D tier, there are K transmitter-receiver (Tx-Rx) pairs
located randomly in the area of a cell. Transmitter k transmits
data to receiver k, and the Tx-Rx pair is associated by the D2D
peer discovery procedure [2]. All D2D pairs share a common
channel, which is orthogonal to the channels used in the
cellular tier¹. Hence, there is no cross-tier interference between
the cellular and D2D tiers. In the cellular tier, the BS plays the
role of the centralized controller for the D2D communications.
Each D2D pair communicates directly on a single-hop link
in a distributed ad hoc manner with the assistance of the
cellular BS. The time is slotted, and the duration of each time
slot is τ. The cellular BS collects necessary information and
broadcasts the resource allocation actions (calculated based on
the collected information) periodically to the D2D nodes at the
beginning of each time slot.

1 The channel for D2D communications could be a dedicated part of the
licensed spectrum allocated by the BS, or another spectrum band, e.g., Wi-Fi
D2D transmission on the ISM band.


B. Physical Layer Model 

Let $s_k$ denote the information symbol for the k-th D2D pair.
The received signal at receiver k is

$$ y_k = \underbrace{h_{kk}\sqrt{P_k}\, s_k}_{\text{desired signal}} + \underbrace{\sum_{j \neq k} h_{kj}\sqrt{P_j}\, s_j}_{\text{interference}} + \underbrace{z_k}_{\text{noise}} \tag{1} $$

where $h_{kj}$ is the complex channel fading coefficient between
transmitter j and receiver k, and $z_k \sim \mathcal{CN}(0, N_0)$ is the i.i.d.
complex Gaussian channel noise with power $N_0$. $P_k$ is the
transmit power of transmitter k. Let $\mathbf{H}(t) = \{H_{kj}(t) : \forall j, k\}$ be
the global CSI, where $H_{kj}(t) = |h_{kj}(t)|^2$ is the instantaneous
channel path gain from transmitter j to receiver k at the t-th
time slot. We consider the CSI according to the block fading
channel model [18], [19] and have the following assumption
on $\mathbf{H}$:

Assumption 1 (Short-Term CSI Model): The CSI $\mathbf{H}(t)$ remains
constant within a time slot and is i.i.d. over time slots.
$H_{kj}(t)$ follows a negative exponential distribution² with mean
$L_{kj}$. Furthermore, $H_{kj}(t)$ is independent w.r.t. the D2D pair
indices k, j. ■

Note that $L_{kj}$ is the large-scale path gain from transmitter
j to receiver k. Let $\mathbf{L} = \{L_{kj} : \forall j, k\}$, and we have the
following assumption on $\mathbf{L}$.

Assumption 2 (Long-Term Path Gain Model): The long-term
large-scale path gain $\mathbf{L}$ is constant for the duration of
the communication session. Specifically, for any transmitter j
and receiver k, the relationship between the path gain $L_{kj}$ and
the distance $d_{kj}$ is³

$$ L_{kj} = G^r_k G^t_j \left( \frac{\lambda}{4 \pi d_{kj}} \right)^2 $$

where $G^r_k$ and $G^t_j$ are the receive and transmit antenna gains
respectively, and $\lambda$ is the carrier wavelength. ■

Let $\mathbf{P}(t) = \{P_k(t) : \forall k\}$ be the collection of the transmit
power of all the D2D transmitters at the t-th time slot. For
given CSI $\mathbf{H}(t)$ and power actions $\mathbf{P}(t)$, the achievable data
rate of the k-th Tx-Rx pair depends on the SINR by treating
interference as noise, which is calculated as

$$ C_k(\mathbf{H}(t), \mathbf{P}(t)) = \log_2\left( 1 + \frac{H_{kk}(t) P_k(t)}{\Gamma \left( N_0 + \sum_{j \neq k} H_{kj}(t) P_j(t) \right)} \right) \tag{2} $$

where $\Gamma$ is the SINR gap [21], which measures the practical
reduction of the SINR with respect to the capacity. $\Gamma$ depends
on the error probability requirement as well as the modulation
scheme.
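
As a concrete illustration of the rate expression, the following minimal Python sketch evaluates the per-pair rates for one slot. It assumes that interference from every other transmitter with nonzero power is treated as noise and that the SINR gap scales the whole noise-plus-interference term; the function name `achievable_rates` and the list-of-lists layout `H[k][j]` (gain from transmitter j to receiver k) are illustrative choices, not notation from the paper.

```python
import math


def achievable_rates(H, P, gap, noise_power):
    """Per-pair rates in bits per channel use, treating interference as noise.

    H[k][j]: instantaneous path gain from transmitter j to receiver k.
    P[j]:    transmit power of transmitter j (0 for a silent transmitter).
    gap:     SINR gap (assumed >= 1); noise_power: N_0.
    """
    K = len(P)
    rates = []
    for k in range(K):
        # Sum of received interference power at receiver k.
        interference = sum(H[k][j] * P[j] for j in range(K) if j != k)
        sinr = H[k][k] * P[k] / (gap * (noise_power + interference))
        rates.append(math.log2(1.0 + sinr))
    return rates
```

For example, with no cross-channel gain, unit noise power and unit gap, a received power of 3 gives an SINR of 3 and hence a rate of 2 bits per channel use.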

2 Rayleigh fading is adopted as an example here for algebraic simplicity. The 
proposed optimization framework is general to cover various channel fading 
models as well. With other fading models, the difference is in integrating 
with different fading distributions when calculating the expectation over H 
to estimate the expected future cost. 

3 Here we adopt the Friis free space path loss model [20]. Note that the
results of this paper can be extended easily to other path loss models.


C. MAC Layer Model 

The D2D nodes utilize a CSMA-like protocol to arbitrate
the random channel access in a distributed manner. The basic
principle of CSMA is listen-before-talk [22], which is
used to avoid collision between simultaneous transmissions of
neighboring nodes. As a result, the MAC protocol determines
the subset of the D2D nodes in which the transmitters can
transmit data simultaneously without causing excessive
interference. For simplicity, we consider the following idealized
MAC protocol model, which has been widely adopted in
justifying the hardcore point process [23].

Assumption 3 (Hardcore Point Process Model): The D2D
nodes adopt a CSMA-like MAC protocol with the carrier sensing
distance⁴ $\delta$. The output of the MAC protocol is captured
by the MAC output process $\boldsymbol{\sigma}(t) = (\sigma_1(t), \dots, \sigma_K(t)) \in \{0, 1\}^K$,
where $\sigma_k(t) = 1$ means that the k-th D2D node
accesses the channel at the t-th time slot. The MAC output
process $\boldsymbol{\sigma}(t)$ has the following properties:

• $\sigma_k(t)$ is i.i.d. over time slots according to the Bernoulli
distribution with mean $\mathbb{E}[\sigma_k(t)] = v_k(\delta)$.

• all transmit nodes have equal opportunity to access the
channel, i.e., $v_k(\delta) = \frac{1}{|\mathcal{N}_k(\delta)| + 1}$, where $\mathcal{N}_k(\delta)$ is the set
of transmit nodes within the carrier sensing distance $\delta$
from transmitter k and $|\mathcal{N}_k(\delta)|$ is the associated
cardinality.

• during each time slot t, a feasible $\boldsymbol{\sigma}(t)$ satisfies the
following carrier sensing constraint: if $\sigma_k(t) = 1$, then
$\sigma_j(t) = 0$ for all $j \in \mathcal{N}_k(\delta)$. ■

The first condition corresponds to the memoryless property
of the MAC protocol with respect to the channel access.
The second condition corresponds to the fairness among the
D2D nodes in the neighbour set, and the third condition
corresponds to the carrier sensing requirement in the MAC
protocol. Note that $v_k(\delta)$ corresponds to the spatial reuse
factor for transmitter k in the D2D network for a given carrier
sensing distance $\delta$. Furthermore, $v_k(\delta)$ and $\mathcal{N}_k(\delta)$ depend on
the topology of the D2D nodes.
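
The three properties of the idealized MAC model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: transmitters are given as 2-D coordinates, "within the carrier sensing distance" is taken to mean a Euclidean distance of at most the sensing distance, and the helper names (`neighbor_sets`, `access_probabilities`, `is_feasible`) are hypothetical. The sketch computes the neighbour sets and access probabilities and checks the carrier sensing constraint for a candidate MAC output; it does not attempt to sample a MAC output with exactly the stated marginals.

```python
import math


def neighbor_sets(tx_positions, delta):
    """Indices of the other transmitters within sensing distance of each one."""
    K = len(tx_positions)
    sets = []
    for k in range(K):
        xk, yk = tx_positions[k]
        sets.append({
            j for j in range(K)
            if j != k
            and math.hypot(tx_positions[j][0] - xk, tx_positions[j][1] - yk) <= delta
        })
    return sets


def access_probabilities(sets):
    """Equal-opportunity access probability 1 / (|N_k| + 1) for each node."""
    return [1.0 / (len(s) + 1) for s in sets]


def is_feasible(sigma, sets):
    """True if no two nodes in each other's neighbour sets are both active."""
    return all(
        not (sigma[k] == 1 and any(sigma[j] == 1 for j in sets[k]))
        for k in range(len(sigma))
    )
```

With three transmitters at (0, 0), (1, 0) and (10, 0) and a sensing distance of 2, the first two nodes sense each other and each access the channel half of the time, while the third node is isolated and can always transmit.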

D. Bursty Data Source and Queue Dynamics 

There is a bursty data source at each D2D transmitter.
Let $\mathbf{A}(t) = (A_1(t)\tau, \dots, A_K(t)\tau)$ be the random arrivals
(number of bits) from the application layers to the K D2D
transmitters at the end of the t-th time slot⁵. We have the
following assumption on $\mathbf{A}(t)$.

Assumption 4 (Bursty Source Model): Assume that $A_k(t)$
is i.i.d. over decision slots according to a general distribution
$\Pr[A_k]$. The moment generating function of $A_k$ exists with
$\mathbb{E}[A_k] = \lambda_k$. $A_k(t)$ is independent w.r.t. k. Furthermore, the
arrival rates $(\lambda_1, \dots, \lambda_K)$ lie within the stability region [24]
of the system. ■

Each D2D transmitter has a data queue for the bursty traffic
flows towards the associated receiver. Let $Q_k(t) \in [0, \infty)$
be the queue length (number of bits) at transmitter k at the
beginning of the t-th slot. Let $\mathbf{Q}(t) = (Q_1(t), \dots, Q_K(t)) \in \mathcal{Q} = [0, \infty)^K$
be the global QSI. The queue dynamics of
transmitter k is

$$ Q_k(t+1) = \max\{ Q_k(t) - \sigma_k(t) C_k(\mathbf{H}(t), \mathbf{P}(t)) \tau,\ 0 \} + A_k(t) \tau \tag{3} $$

4 Carrier sensing distance refers to the carrier sensing range of the associated
CSMA protocol. Two nodes within the carrier sensing distance will not
transmit simultaneously.

5 We assume that the transmitters are causal, so that the packets arriving in
a time slot are not observed when the control actions of that time slot are
performed.
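
The queue recursion can be read as "serve first, then admit the new arrivals". A minimal Python sketch of one slot of this update is given below; the function name `queue_update` and the flat-list layout are illustrative assumptions, with rates and arrivals expressed per unit time and multiplied by the slot duration as in the text.

```python
def queue_update(Q, sigma, rates, arrivals, tau):
    """One slot of the queue dynamics for all K transmitters.

    Q[k]:        queue length (bits) at the beginning of the slot.
    sigma[k]:    MAC output (1 if transmitter k accesses the channel).
    rates[k]:    achievable data rate of pair k in this slot.
    arrivals[k]: arrival rate realised in this slot.
    tau:         slot duration.
    """
    return [
        # Departures cannot drain more than what is queued; arrivals are
        # added at the end of the slot.
        max(q - s * c * tau, 0.0) + a * tau
        for q, s, c, a in zip(Q, sigma, rates, arrivals)
    ]
```

A transmitter that is silenced by the MAC protocol in a slot only accumulates arrivals, which is how the MAC output enters the delay performance.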

Remark 1 (Weak Coupling Property of Queue Dynamics):
The K queue dynamics in the D2D system are coupled
together due to the interference term in (2). Specifically,
the departure of the queue at each transmitter depends
on the power actions of all the K D2D transmitters.
Furthermore, the CSMA-like mechanism in the MAC
protocol model in Assumption 3 contributes to filtering the
strong interference between the active D2D transmitters.
Let $L^\delta = \max\{ L_{kj} : \forall k \neq j,\ d_{kj} > \delta \}$ be the worst-case
cross-channel path gain for a given carrier sensing distance $\delta$. Due
to the interference filtering property of the MAC protocol,
there is only weak queue coupling in the D2D network, and
$L^\delta$ measures the coupling intensity. We will leverage this
weak coupling property to derive low complexity closed-form
approximate solutions in Section IV. ■
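
To make the coupling intensity concrete, the sketch below computes large-scale gains with the Friis free space model named in footnote 3 and then takes the worst-case cross-channel gain over all links longer than the sensing distance. The function names and the convention that `d[k][j]` is the distance from transmitter j to receiver k are assumptions made for illustration.

```python
import math


def friis_gain(distance, wavelength, g_rx=1.0, g_tx=1.0):
    """Free space path gain G_r * G_t * (wavelength / (4 * pi * d))^2."""
    return g_rx * g_tx * (wavelength / (4.0 * math.pi * distance)) ** 2


def worst_case_cross_gain(L, d, delta):
    """Largest cross-channel gain among links with distance above delta.

    Returns 0.0 when no cross link survives the filtering.
    """
    K = len(L)
    gains = [
        L[k][j]
        for k in range(K)
        for j in range(K)
        if k != j and d[k][j] > delta
    ]
    return max(gains) if gains else 0.0
```

Because the free space gain decays with the square of the distance, enlarging the sensing distance shrinks the worst-case cross gain, which is the mechanism behind the weak coupling used in the later perturbation analysis.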


III. Delay-Aware Cross-Layer Control Framework

In this section, we formally formulate the delay-aware cross-layer
radio resource control framework for D2D communications.
We first define the control policy and the optimization
objective. We then formulate the design as a Markov Decision
Process (MDP) and derive the optimality conditions for
solving the problem.

A. Power Control Policy

For delay-sensitive applications, it is important to dynamically
adapt the transmit power of the D2D nodes based on
the instantaneous realizations of the CSI (which captures the
instantaneous transmission opportunities) and the QSI (which captures the
urgency of the K data flows). Let $\chi = (\boldsymbol{\sigma}, \mathbf{H}, \mathbf{Q})$ denote the
global system state. We define the stationary power control
policy below.

Definition 1 (Stationary Power Control Policy): A stationary
control policy $\Omega_k$ for the k-th D2D transmitter is a
mapping from the system state $\chi$ to the power control action
of transmitter k. Specifically, $\Omega_k(\chi) = P_k \geq 0$. Let
$\Omega = \{\Omega_k : \forall k\}$ denote the aggregation of the control policies
for all the K D2D transmitters. ■

Since the D2D nodes access the channel randomly, the
MAC output $\boldsymbol{\sigma}$ is i.i.d. over time slots. The CSI $\mathbf{H}$ is i.i.d.
over time slots based on the block fading channel model in
Assumption 1. Furthermore, from the queue evolution equation
in (3), $\mathbf{Q}(t+1)$ depends only on $\mathbf{Q}(t)$ and the data rate. Given
a control policy $\Omega$, the data rate of the k-th flow at the t-th time
slot depends on $\sigma_k(t)$, $\mathbf{H}(t)$ and $\Omega(\chi(t))$. Hence, the global system state
$\chi(t)$ is a controlled Markov chain [12] with the transition
probability

$$ \Pr[\chi(t+1) \mid \chi(t), \Omega(\chi(t))] = \Pr[\boldsymbol{\sigma}(t+1)] \Pr[\mathbf{H}(t+1)] \Pr[\mathbf{Q}(t+1) \mid \chi(t), \Omega(\chi(t))] \tag{4} $$

where the queue transition probability is given by

$$ \Pr[\mathbf{Q}(t+1) \mid \chi(t), \Omega(\chi(t))] = \begin{cases} \prod_k \Pr[A_k(t)], & \text{if } Q_k(t+1) \text{ is given by (3)}, \forall k \\ 0, & \text{otherwise} \end{cases} \tag{5} $$

For technical reasons, we consider the admissible control
policy defined below.

Definition 2 (Admissible Control Policy): A policy $\Omega$ is
admissible if the following requirements are satisfied:

• $\Omega$ is a unichain policy, i.e., the controlled Markov chain
$\{\chi(t)\}$ under $\Omega$ has a single recurrent class (and possibly
some transient states) [12].

• The queueing system under $\Omega$ is third-order stable in the
sense that $\lim_{T \to \infty} \mathbb{E}^{\Omega}\left[ \sum_{k=1}^{K} Q_k^3(T) \right] < \infty$, where $\mathbb{E}^{\Omega}$
means taking expectation w.r.t. the probability measure
induced by the control policy $\Omega$. ■

B. Problem Formulation

Under an admissible control policy $\Omega$, the
average delay cost for the k-th D2D pair is given by

$$ \overline{D}_k(\Omega) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}^{\Omega}\left[ \frac{Q_k(t)}{\lambda_k} \right], \quad \forall k \tag{6} $$

Similarly, under an admissible control policy $\Omega$, the average
power cost of the k-th D2D transmitter is given by

$$ \overline{P}_k(\Omega) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}^{\Omega}\left[ P_k(t) \right], \quad \forall k \tag{7} $$

We formulate the dynamic power control problem for the
delay-aware D2D system as follows:

Problem 1 (Power Control for Delay-Aware D2D Systems):
The power control problem for the delay-aware D2D
communications is formulated as

$$ \min_{\Omega} L(\Omega) = \sum_{k=1}^{K} \Big( \beta_k \underbrace{\overline{D}_k(\Omega)}_{\text{average delay}} + \gamma_k \underbrace{\overline{P}_k(\Omega)}_{\text{average power}} \Big) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}^{\Omega}\left[ c(\mathbf{Q}(t), \Omega(\chi(t))) \right] \tag{8} $$

where $c(\mathbf{Q}, \mathbf{P}) = \sum_{k=1}^{K} \left( \beta_k \frac{Q_k}{\lambda_k} + \gamma_k P_k \right)$. $\boldsymbol{\beta} = \{\beta_k > 0 : \forall k\}$
and $\boldsymbol{\gamma} = \{\gamma_k > 0 : \forall k\}$ are positive weights for the delay
cost and the power cost respectively. ■
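
The objective of Problem 1 is a time average of a per-stage cost that weighs queue lengths (scaled by the arrival rates, so that they read as delays) against transmit powers. The short Python sketch below evaluates that per-stage cost and an empirical finite-horizon average over a simulated trajectory. The finite horizon is an assumption standing in for the limit superior and the expectation in the formulation, and the names `stage_cost` and `average_cost` are illustrative.

```python
def stage_cost(Q, P, beta, lam, gamma):
    """Per-stage cost: sum over flows of beta*Q/lam + gamma*P."""
    return sum(
        b * q / l + g * p
        for q, p, b, l, g in zip(Q, P, beta, lam, gamma)
    )


def average_cost(trajectory, beta, lam, gamma):
    """Empirical average of the per-stage cost over (Q, P) samples."""
    costs = [stage_cost(Q, P, beta, lam, gamma) for Q, P in trajectory]
    return sum(costs) / len(costs)
```

Raising a delay weight relative to the power weights makes long queues more expensive, so a policy minimising this average spends more power to drain them; the weights therefore act as the knobs that trade delay for power.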

Problem 1 embraces various optimization formulations such
as minimizing the average delay subject to the average power
constraint or minimizing the average transmit power subject
to the average delay constraint. This is because these
"constrained optimization problems" have the same Lagrangian
function, which is given by (8) in Problem 1. The weights
$\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are equivalent to the Lagrangian multipliers of the
associated constraints. Also note that Problem 1 is an infinite
horizon average cost MDP, which is known to be a very difficult
problem.

C. Optimality Conditions for Power Control Problem

Problem 1 is an MDP, and the associated Bellman
equation [12] involves the entire system state $\chi = (\boldsymbol{\sigma}, \mathbf{H}, \mathbf{Q})$.
Exploiting the i.i.d. properties of $\mathbf{H}(t)$ and $\boldsymbol{\sigma}(t)$, we obtain
the following equivalent Bellman equation.

Theorem 1 (Sufficient Conditions for Optimality): For any
given weights $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$, assume there exists a $(\theta^*, \{V^*(\mathbf{Q})\})$
that solves the following equivalent Bellman equation:

$$ \theta^* \tau + V^*(\mathbf{Q}) = \mathbb{E}\left[ \min_{\Omega(\chi)} \left[ c(\mathbf{Q}, \Omega(\chi)) \tau + \sum_{\mathbf{Q}'} \Pr[\mathbf{Q}' \mid \chi, \Omega(\chi)] V^*(\mathbf{Q}') \right] \Bigg| \mathbf{Q} \right], \quad \forall \mathbf{Q} \in \mathcal{Q} \tag{9} $$

Furthermore, for all admissible control policies $\Omega$, $V^*$ satisfies
the following transversality condition:

$$ \lim_{T \to \infty} \frac{1}{T} \mathbb{E}^{\Omega}\left[ V^*(\mathbf{Q}(T)) \right] = 0 \tag{10} $$

Then $\theta^* = \min_{\Omega} L(\Omega)$ is the optimal average cost, and $V^*(\mathbf{Q})$
is the priority function of the K data flows. If $\Omega^*(\chi)$ attains
the minimum of the R.H.S. of (9) for all $\mathbf{Q} \in \mathcal{Q}$, then $\Omega^*$ is
the optimal control policy for Problem 1. ■

Proof: Please refer to Appendix A. ■

Remark 2 (Interpretation of Theorem 1): At each stage
when the queue length is $\mathbf{Q}(t)$, the optimal action has to
strike a balance between the current cost and the future cost
because the action taken will affect the future evolution of
$\mathbf{Q}(t+1)$. Furthermore, based on the unichain property of the
admissible control policy, the solution obtained from Theorem
1 is unique [12]. ■


IV. Low-Complexity Power Control Solution

One key obstacle in deriving the optimal power control
policy $\Omega^*$ is to obtain the priority function for the Bellman
equation in (9). Conventional brute force value iteration or
policy iteration algorithms can only give numerical solutions
and have exponential complexity in K, which is highly
undesirable. In this section, we shall exploit the interference
filtering property of the MAC protocol and adopt perturbation
theory to obtain a closed-form approximation of the priority
function $V^*(\mathbf{Q})$ and derive the associated error bound. Based
on that, we obtain a low complexity dynamic power control
algorithm for the delay-aware D2D communications.


A. Closed-Form Approximate Priority Function via Perturbation
Analysis

We adopt a calculus approach to obtain a closed-form
approximate priority function. We first have the following
theorem for solving the Bellman equation in (9).

Theorem 2 (Calculus Approach for Solving (9)): Assume
there exist $c^\infty$ and $J(\mathbf{Q}; L^\delta)$ of class $\mathcal{C}^2(\mathbb{R}_+^K)$ that satisfy

• the following partial differential equation (PDE):

$$ \mathbb{E}\left[ \min_{\Omega(\chi)} \left[ \sum_{k=1}^{K} \left( \beta_k \frac{Q_k}{\lambda_k} + \gamma_k P_k \right) - c^\infty + \sum_{k=1}^{K} \frac{\partial J(\mathbf{Q}; L^\delta)}{\partial Q_k} \left( \lambda_k - \sigma_k C_k(\mathbf{H}, \mathbf{P}) \right) \right] \Bigg| \mathbf{Q} \right] = 0, \quad \forall \mathbf{Q} \in \mathcal{Q} \tag{11} $$

with boundary condition $J(\mathbf{0}; L^\delta) = 0$.

• $\left\{ \frac{\partial J(\mathbf{Q}; L^\delta)}{\partial Q_k} : \forall k \right\}$ are increasing functions of all $Q_k$.

• $J(\mathbf{Q}; L^\delta) = \mathcal{O}(\|\mathbf{Q}\|^3)$.

Then, we have

$$ \theta^* = c^\infty + o(1), \quad V^*(\mathbf{Q}) = J(\mathbf{Q}; L^\delta) + o(1), \quad \forall \mathbf{Q} \in \mathcal{Q} \tag{12} $$

where the error term $o(1)$ asymptotically goes to zero for
sufficiently small $\tau$. ■

Proof: Please refer to Appendix B. ■

Theorem 2 suggests that if we can solve the PDE in (11),
then the solution $(J(\mathbf{Q}; L^\delta), c^\infty)$ is only $o(1)$ away from
the solution of the Bellman equation $(V^*(\mathbf{Q}), \theta^*)$. Before we
solve the K-dimensional PDE in (11), we first recognize that,
due to the interference filtering property of the MAC protocol
in Assumption 3, the cross-channel path gains among all the active
D2D flows are quite weak and the worst-case interfering path
gain is $L^\delta$. Note that the solution of (11) depends on the
worst-case cross-channel path gain $L^\delta$ and, hence, the
K-dimensional PDE in (11) can be regarded as a perturbation
of a base system defined below.

Definition 3 (Base System): A base system is characterized
by the PDE in (11) with $L^\delta = 0$. ■

We then study the base system and use $J(\mathbf{Q}; 0)$ to obtain a
closed-form approximation of $J(\mathbf{Q}; L^\delta)$. We have the following
lemma summarizing the priority function $J(\mathbf{Q}; 0)$ of the
base system.

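Lemma 1 (Decomposable Structure of $J(\mathbf{Q}; 0)$): The solution
$J(\mathbf{Q}; 0)$ for the base system has the following decomposable
structure:

$$ J(\mathbf{Q}; 0) = \sum_{k=1}^{K} J_k(Q_k) \tag{13} $$

where $J_k(Q_k)$ is the per-flow priority function for the k-th
data flow, given in parametric form (with parameter y) by

$$ \begin{cases} Q_k(y) = \dfrac{\lambda_k}{\beta_k} \left( \dfrac{(a_k + y) E_1\!\left(\frac{a_k}{y}\right) - y e^{-\frac{a_k}{y}}}{(|\mathcal{N}_k(\delta)| + 1) \ln 2} - \lambda_k y + c_k^\infty \right) \\[3ex] J_k(y) = \dfrac{\lambda_k}{\beta_k} \left( \dfrac{(2 y^2 - a_k^2) E_1\!\left(\frac{a_k}{y}\right) - y (y - a_k) e^{-\frac{a_k}{y}}}{4 (|\mathcal{N}_k(\delta)| + 1) \ln 2} - \dfrac{\lambda_k y^2}{2} \right) + b_k \end{cases} \tag{14} $$

where $a_k = \frac{N_0 \Gamma \gamma_k \ln 2}{L_{kk}}$,
$c_k^\infty = \frac{1}{(|\mathcal{N}_k(\delta)| + 1) \ln 2} \left( d_k e^{-\frac{a_k}{d_k}} - a_k E_1\!\left(\frac{a_k}{d_k}\right) \right)$, where $d_k$ satisfies
$E_1\!\left(\frac{a_k}{d_k}\right) = (|\mathcal{N}_k(\delta)| + 1) \lambda_k \ln 2$, $E_1(x) = \int_x^\infty \frac{e^{-t}}{t} \, dt$ is the exponential
integral function, and $b_k$ is chosen to satisfy⁶ the boundary condition
$J_k(0) = 0$. ■

Proof: Please refer to Appendix C. ■

6 To find $b_k$, first solve $Q_k(y_k^0) = 0$ using one-dimensional search
techniques (e.g., the bisection method). Then $b_k$ is chosen such that $J_k(y_k^0) = 0$.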

Note that when $L^\delta = 0$, the interference network has
$L_{kj} = 0$ for all $k \neq j$ with $d_{kj} > \delta$ and, hence, there is no
interference between the active D2D nodes. As a result, the K
D2D flows are totally decoupled and the system is equivalent
to a decoupled system with K independent D2D flows. That
is why the priority function $J(\mathbf{Q}; 0)$ in the base system has
the decomposable structure in Lemma 1.
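
The per-flow quantities of the decoupled base system can be evaluated numerically with a one-dimensional search, as the footnote to Lemma 1 suggests. The Python sketch below is a hedged illustration rather than the authors' implementation. It assumes the parametric form obtained by solving the single-flow version of the PDE with water-filling power control over a Rayleigh-faded direct link, namely a queue length of (λ/β)·[((a + y)·E1(a/y) − y·exp(−a/y)) / ((n + 1)·ln 2) − λ·y + c] at parameter y, with n the number of neighbours. The exponential integral is implemented with a textbook series and continued fraction so that the block needs only the standard library, and all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, max_iter=200, eps=1e-12):
    """Exponential integral E_1(x) for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: -gamma - ln x - sum_{i>=1} (-x)^i / (i * i!).
        total = -math.log(x) - EULER_GAMMA
        term = 1.0
        for i in range(1, max_iter + 1):
            term *= -x / i
            delta = -term / i
            total += delta
            if abs(delta) < abs(total) * eps:
                break
        return total
    # Continued fraction evaluated with the modified Lentz method.
    b = x + 1.0
    c = 1.0 / 1e-300
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x)


def per_flow_constants(a, lam, n_neighbors, lo=1e-9, hi=1e9, iters=200):
    """Bisection (on a log scale) for d with E_1(a/d) = (n+1)*lam*ln 2, then c."""
    scale = (n_neighbors + 1) * math.log(2.0)
    target = scale * lam
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        # E_1(a/d) increases with d, so move the lower end up when too small.
        if exp_integral_e1(a / mid) < target:
            lo = mid
        else:
            hi = mid
    d = math.sqrt(lo * hi)
    c = (d * math.exp(-a / d) - a * exp_integral_e1(a / d)) / scale
    return d, c


def queue_length_of_y(y, a, lam, beta, n_neighbors, c):
    """Queue length as a function of the parameter y (the flow weight)."""
    scale = (n_neighbors + 1) * math.log(2.0)
    e1 = exp_integral_e1(a / y)
    inner = ((a + y) * e1 - y * math.exp(-a / y)) / scale - lam * y + c
    return (lam / beta) * inner
```

Under these assumptions the parameter value found by the search is exactly the point where the queue length is zero, and the queue length grows with the parameter beyond it. Inverting `queue_length_of_y` with another bisection would give the flow weight for any queue length.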

We then analyze the asymptotic property of the per-flow
priority function $J_k(Q_k)$ in Corollary 1.

Corollary 1 (Asymptotic Property of $J_k(Q_k)$):

$$ J_k(Q_k) = \frac{\beta_k (|\mathcal{N}_k(\delta)| + 1)}{2 \lambda_k} \frac{Q_k^2}{\log_2(Q_k)} + o\left( \frac{Q_k^2}{\log_2(Q_k)} \right), \quad \text{as } Q_k \to \infty \tag{15} $$

Proof: Please refer to Appendix D. ■
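
A small numerical sketch can help to read the asymptotic result. The Python functions below evaluate the leading term β(n + 1)Q² / (2λ log₂ Q), with n the number of neighbours, and its derivative with respect to the queue length, which is the quantity that acts as a flow weight for long queues. Only the leading term is modelled (the lower-order remainder is ignored), and the function names are illustrative assumptions.

```python
import math


def asymptotic_priority(q, beta, lam, n_neighbors):
    """Leading term of the per-flow priority function for large q (q > 1)."""
    return beta * (n_neighbors + 1) * q * q / (2.0 * lam * math.log2(q))


def asymptotic_flow_weight(q, beta, lam, n_neighbors):
    """Derivative of the leading term with respect to q (q > 1)."""
    log2q = math.log2(q)
    factor = beta * (n_neighbors + 1) / (2.0 * lam)
    # d/dq [q^2 / log2 q] = 2q / log2 q - q / (ln 2 * (log2 q)^2).
    return factor * (2.0 * q / log2q - q / (math.log(2.0) * log2q ** 2))
```

The weight grows with the queue length, so longer queues receive more service. It also grows with the number of neighbours, since a node that shares the channel with more neighbours accesses it less often and each backlog unit is therefore more costly.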

Next, we study the PDE in (11) for large $\delta$. Note that large
$\delta$ corresponds to small cross-channel path gains within the set
of active D2D nodes. Hence, $J(\mathbf{Q}; L^\delta)$ can be considered as a
perturbation of the solution of the base system $J(\mathbf{Q}; 0)$. Using
perturbation analysis, we establish the following theorem on
the approximation of $J(\mathbf{Q}; L^\delta)$:

Theorem 3 (First Order Approximation of $J(\mathbf{Q}; L^\delta)$):
$J(\mathbf{Q}; L^\delta)$ can be approximated by $J(\mathbf{Q}; 0)$, and the first
order perturbation term is given by

J{Q;L S ) =J( Q;0) 


K 




k =1 o^k 

3^ k (5) 
~\2 


( DkjLkjQtQj 
H lo g2Q/c) 2 log 2 Qj 


+ 0 


( DkjLkjQtQj \ \ 
\ (log 2 Qk ) 2 log 2 Qj) ) 


O 


(16) 


where Di ■ - - 

where U kj - 2 (i n 2 )A fc A J - 7 i Ar 0 * 

Proof: Please refer to Appendix E. ■ 

The priority function $V(\mathbf{Q})$ is decomposed into the following
three terms: 1) the base term $\sum_k J_k(Q_k)$ obtained by
solving a base system without coupling, 2) the perturbation
term accounting for the first order interference coupling due
to simultaneously transmitting D2D nodes after MAC filtering,
and 3) the residual error term. As a result, we adopt the
following closed-form approximation of $V(\mathbf{Q})$:

$$ V(\mathbf{Q}) = \sum_{k=1}^{K} J_k(Q_k) + \sum_{k=1}^{K} \sum_{j \neq k,\, j \notin \mathcal{N}_k(\delta)} \frac{D_{kj} L_{kj} Q_k^2 Q_j}{(\log_2 Q_k)^2 \log_2 Q_j} \tag{17} $$

Remark 3 (Approximation Error w.r.t. System Parameters):

• Approximation Error w.r.t. Traffic Loading: the approximation
error is a decreasing function of the average
arrival rate $\boldsymbol{\lambda}$.

• Approximation Error w.r.t. SNR: the approximation
error is an increasing function of the SNR (which is a
decreasing function of $\gamma_k$).

• Approximation Error w.r.t. Sensing Distance: the ap¬ 
proximation error is a decreasing function of the carrier 
sensing distance at the ordei 0 at least O (p-). 


7 For any k, j f k and j £ A4(<5), we have d^j > S. Therefore, 
according to the long term path gain model in Assumption [2] we have 
r _ _„( l\ 


From Corollary \l\ and (fT71) . the priority function V(Q) = 

q2 

0( i 0 g Q k ) for large Vfc. As a result, the longer queue will 
get higher priority in the order of i 0 ®Q k - Based on Theorem |T] 
and Theorem [3] the approximation error between the optimal 
priority function V* (Q) in Theorem |T| and the closed-form 
approximate priority function V (Q) in ([17]) is O(^) + o( 1 ). 
In other words, the error terms are asymptotically small w.r.t. 
the carrier sensing distance S and the slot duration. 


B. Asymptotically Delay-Optimal Power Control Algorithm 

In this section, we use the closed-form approximate priority
function in (17) to capture the urgency information of the
K D2D pairs and obtain low complexity delay-aware power
control. Using the approximate priority function in (17) and
Lemma 2, the per-stage control problem (for each state
realization $\chi$) is given by⁸

$$ \max_{\mathbf{P}} \sum_{k=1}^{K} \Big( \underbrace{\frac{\partial V(\mathbf{Q})}{\partial Q_k}}_{\text{flow weight}} \underbrace{C_k(\mathbf{H}, \mathbf{P})}_{\text{data rate}} - \gamma_k P_k \Big) \tag{18} $$

where $\frac{\partial V(\mathbf{Q})}{\partial Q_k}$ can be calculated from (17), which is given by

$$ \frac{\partial V(\mathbf{Q})}{\partial Q_k} = J_k'(Q_k) + \sum_{j \neq k,\, j \notin \mathcal{N}_k(\delta)} \frac{Q_j \left( (\ln 2) \log_2 Q_k - 1 \right)}{(\ln 2) (\log_2 Q_k)^2 \log_2 Q_j} \left( \frac{2 D_{kj} L_{kj} Q_k}{\log_2 Q_k} + \frac{D_{jk} L_{jk} Q_j}{\log_2 Q_j} \right) \tag{19} $$

The per-stage problem in (18) is similar to the weighted
sum-rate (WSR) optimization subject to the power constraint,
which has been widely studied in [25] and [26]. However,
unlike conventional WSR problems where the weights are
static, the weights here in (18) are dynamic and are determined
by the QSI via the priority function $\frac{\partial V(\mathbf{Q})}{\partial Q_k}$. As such, the role
of the QSI is to dynamically adjust the weight (priority) of the
individual flows, whereas the role of the CSI is to adjust the
priority of the flow based on the transmission opportunity in
the rate function $C_k(\mathbf{H}, \mathbf{P})$. Note that the per-stage problem
in (18) is challenging due to the non-convexity of $C_k(\mathbf{H}, \mathbf{P})$
w.r.t. $\mathbf{P}$. We shall first derive a low complexity iterative
solution that converges to a stationary point of (18). We then
show that the converged solution is asymptotically optimal for
sufficiently small $L^\delta$.

Algorithm 1 (Delay-Aware Dynamic Power Control):

• Step 1 [Initialization]: Let n = 0. Initialize a feasible
$\mathbf{P}(0)$.

• Step 2 [Iteration]: In the (n+1)-th iteration, the transmit
power of each D2D transmitter is updated based on the
power results of the n-th iteration according to

$$ P_k(n+1) = \left( \frac{\partial V(\mathbf{Q})}{\partial Q_k} \frac{1}{(\ln 2)(\gamma_k + c_k(n))} - \frac{\Gamma I_k(n)}{H_{kk}} \right)^+ \tag{20} $$

where $(x)^+ = \max\{x, 0\}$, $I_k(n) = N_0 + \sum_{j \neq k,\, j \notin \mathcal{N}_k(\delta)} H_{kj} P_j(n)$ and
$c_k(n) = \sum_{j \neq k,\, j \notin \mathcal{N}_k(\delta)} \frac{1}{\ln 2} \frac{\partial V(\mathbf{Q})}{\partial Q_j} \frac{H_{jj} P_j(n) H_{jk}}{I_j(n) \left( \Gamma I_j(n) + H_{jj} P_j(n) \right)}$.

• Step 3 [Termination]: Set n = n + 1 and go to Step 2
until a certain termination condition is satisfied. ■

8 Note that $J_k'(Q_k) = \left. \frac{dJ_k(y)/dy}{dQ_k(y)/dy} \right|_{y = y(Q_k)} = y(Q_k)$, where
$y(Q_k)$ satisfies $Q_k(y(Q_k)) = Q_k$.
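
The iteration in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under several assumptions: the flow weights (the partial derivatives of the approximate priority function) are supplied as an input list `weights`, only the active transmitters are passed in, the all-zero vector is used as the feasible starting point, and a fixed iteration count stands in for the unspecified termination condition. The interference price uses the gain from transmitter k to receiver j (`H[j][k]`), which is what the stationarity condition of the weighted sum-rate objective gives; that index order should be treated as an assumption.

```python
import math


def power_control(H, weights, gamma, gap, noise_power, neighbors, n_iter=50):
    """Iterative water-filling-style update for the per-stage problem.

    H[k][j]:      gain from transmitter j to receiver k.
    weights[k]:   flow weight of pair k (derivative of the priority function).
    gamma[k]:     power price of pair k.
    neighbors[k]: set of nodes silenced whenever k is active.
    """
    K = len(weights)
    ln2 = math.log(2.0)
    P = [0.0] * K

    def interferers(k):
        return [j for j in range(K) if j != k and j not in neighbors[k]]

    for _ in range(n_iter):
        # Noise plus interference seen by each receiver at iteration n.
        I = [noise_power + sum(H[k][j] * P[j] for j in interferers(k))
             for k in range(K)]
        new_P = []
        for k in range(K):
            # Price of the interference that transmitter k causes to others.
            price = sum(
                weights[j] / ln2 * H[j][j] * P[j] * H[j][k]
                / (I[j] * (gap * I[j] + H[j][j] * P[j]))
                for j in interferers(k)
            )
            level = weights[k] / (ln2 * (gamma[k] + price))
            new_P.append(max(level - gap * I[k] / H[k][k], 0.0))
        P = new_P  # all powers are updated from the previous iterate
    return P
```

When all cross gains are zero the update reduces to single-user water-filling with water level `weights[k] / (ln 2 * gamma[k])`, so flows with a small weight or a weak direct link are switched off. With weak cross gains, each power is reduced slightly by the interference it receives and by the price of the interference it causes.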

Although the problem in (18) is non-convex in general, we
show below that Algorithm 1 converges to the global optimal
solution asymptotically for sufficiently large $\delta$.

Corollary 2 (Asymptotic Optimality of Algorithm 1):
Algorithm 1 converges to the unique global optimal point of
the problem in (18) for sufficiently large $\delta$. ■

Proof: Please refer to Appendix F. ■

C. Summary of the Overall Solution and Implementation Considerations

We give a summary of the overall dynamic power control
solution and discuss some implementation considerations
(computational complexity) in the context of LTE-Advanced
systems [27]. Specifically, we consider the scenario of fully
controlled D2D communications [28] in LTE-Advanced in
which the eNodeB takes control of the radio resource for
the D2D nodes inside its coverage. A frame is divided into
a contention phase, a reporting phase, a decision phase and a
data transmission phase, which are described as follows:

1) Contention Phase: D2D nodes access the channels
distributively according to a CSMA-like MAC protocol.
At the end of the contention phase, each D2D transmitter
gets its corresponding MAC output $\sigma_k(t)$. Also, during
this phase, the CSI $\mathbf{H}(t)$ could be estimated by the D2D
receiver⁹.

2) Reporting Phase: Each of the active transmitters
($\mathcal{A}(t) = \{k : \sigma_k(t) = 1\}$) reports its local CSI
$\{H_{kj}(t) : \forall j\}$ and local QSI $Q_k(t)$ to the eNodeB via the
Physical Uplink Control Channel (PUCCH) and the Physical
Uplink Shared Channel (PUSCH) [29], respectively.

3) Decision Phase: After receiving the CSI and QSI reports,
the eNodeB calculates the optimal power for the
active D2D nodes according to the proposed Algorithm 1
and broadcasts the power control actions to the active
D2D nodes via the Physical Downlink Control Channel
(PDCCH) [29].

4) Data Transmission Phase: The active D2D transmitters
adjust their transmit power according to the power
control actions broadcast by the eNodeB and transmit data
during the data transmission phase in the current frame.
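The contention phase can be illustrated with a small Python sketch. The paper only states that the MAC is CSMA-like, so the random-backoff rule below is an assumption made for illustration: every transmitter draws an i.i.d. backoff timer and becomes active only if its timer is the smallest among all transmitters within the carrier sensing distance. Under this rule a transmitter with $|\mathcal{N}_k|$ neighbors wins with probability $1/(|\mathcal{N}_k|+1)$. The function name and arguments are hypothetical.

```python
import numpy as np

def csma_like_contention(tx_pos, delta, rng):
    """Return sigma[k] in {0, 1}; 1 means transmitter k is active in this frame.

    tx_pos : (K, 2) array of transmitter coordinates in meters
    delta  : carrier sensing distance in meters
    rng    : numpy random Generator used to draw the backoff timers
    """
    tx_pos = np.asarray(tx_pos, dtype=float)
    K = len(tx_pos)
    # Pairwise distances between transmitters.
    dist = np.linalg.norm(tx_pos[:, None, :] - tx_pos[None, :, :], axis=-1)
    # neighbors[k, j] is True when j is another transmitter within delta of k.
    neighbors = (dist <= delta) & ~np.eye(K, dtype=bool)
    backoff = rng.random(K)               # assumed i.i.d. uniform backoff timers
    sigma = np.ones(K, dtype=int)
    for k in range(K):
        # k defers if any neighbor's timer expires earlier.
        if np.any(backoff[neighbors[k]] < backoff[k]):
            sigma[k] = 0
    return sigma
```

By construction no two active transmitters lie within the sensing distance of each other, which is the interference filtering property the power control stage relies on.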

Remark 4 (Computational Complexity Consideration):
The computational complexity of the proposed solution
is very low. Specifically, most of the complexity comes from
computing the priority function in (17) and computing the
power control actions using Algorithm 1. The complexity
of computing the priority function is very low (due to the
closed-form characterization) compared with conventional
brute-force value iteration algorithms [12], which have
exponential complexity in K. Computing the power control
actions using Algorithm 1 is similar to the conventional
iterative water-filling solutions for solving WSR optimization
in [25]. We shall quantify the complexity comparison in
Section V. ■

9 Each active D2D transmitter has to send the control signaling for MAC
contention. The CSI can be estimated if the signaling is sent with a given
power, i.e., as the reference signal.

Remark 5 (Extension for OFDMA and General Fading): 
The solution framework in Theorem 2 and Theorem 3 can be 
extended easily to multi-channel systems (such as OFDMA 
[30]) as well as general fading distributions. For OFDMA
systems, the modification required is the rate equation in ©. 
Each channel can be treated independently since orthogonal 
parallel channels do not introduce additional coupling. For 
general fading distributions, the modification required is the 
solution of the per-flow PDE in the base system $J_k(Q_k)$ in
Lemma 1. ■

V. Simulation Results 

In this section, we evaluate the performance of the proposed
low-complexity power control scheme for D2D communications.
The following four baseline schemes are adopted for
performance comparison.

• Baseline 1 [Cellular Mode]: The Tx-Rx pairs transmit 
their data via the cellular BS in a conventional way 0 . 
The K pairs share the channel using TDMA in a round-robin
way.

• Baseline 2 [D2D with Fixed Power]: The transmitters 
always transmit with the maximum power for D2D com¬ 
munications 0 . 

• Baseline 3 [D2D with CSI-based Power Control]:
The large deviation approach [31] bypasses the complex
delay minimization by converting the delay constraint
into an equivalent rate constraint. The CSI-based
power control scheme determines the transmit power for
maximizing the total data rate without considering the
queueing information [32].

• Baseline 4 [D2D with Queue-weighted Power Control]:
The Lyapunov drift approach [24] considers queue stabilization
instead of delay minimization. The queue-weighted
power control scheme exploits both CSI and QSI, and
solves the per-stage problem (18) with $\partial V(\mathbf{Q})/\partial Q_k$
replaced by $Q_k$. It is similar to the Modified Largest Weighted Delay
First algorithm in [33] but with a modified objective
function.

In the simulations, 10 D2D pairs are considered in a single
cell with radius 500 m. The transmitters are located randomly
in the cell and the receivers appear within the D2D communication
range of their corresponding transmitters, which is set
to 50 m. The carrier sensing distance $\delta$ is 100 m. Poisson data
arrival is considered with a uniformly distributed average arrival
rate, which has mean 5 Mbps. The path gain is calculated as
$L_{kj} = 15.3 + 37.6 \log_{10} d_{kj}$ [34] with the fading coefficient
distributed as $\mathcal{CN}(0, 1)$. The average transmit power is 23 dBm
and the noise power spectral density is -174 dBm/Hz. The
system bandwidth is 10 MHz. The duration of the time slot is
1 ms. The SINR gap $\Gamma$ is set to 1 in the simulation. The weights
$\gamma_k$ are the same and $\beta_k = 1$ for all k. For comparison, the
delay performances of different schemes are evaluated with the
same average transmit power by adjusting $\gamma_k$. For obtaining
the average performance, we consider 100 random topologies,
each of which has 1000 time slots.

Fig. 2. Performance comparison with different average arrival rates

Fig. 3. Performance comparison with different average transmit power
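The link budget implied by these simulation parameters can be worked out with a short Python sketch. Two assumptions are made here that the text does not state explicitly: the expression $15.3 + 37.6 \log_{10} d_{kj}$ is read as a path loss in dB with the distance in meters, and the fast fading is ignored so that only the mean SNR of an interference-free link is computed. The helper names are hypothetical.

```python
import math

def path_loss_db(d_m):
    """Path loss in dB at distance d_m (meters), assuming the model in the text."""
    return 15.3 + 37.6 * math.log10(d_m)

def noise_power_dbm(bandwidth_hz, psd_dbm_per_hz=-174.0):
    """Total noise power in dBm over the given bandwidth."""
    return psd_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)

def mean_snr_db(tx_dbm, d_m, bandwidth_hz):
    """Mean SNR in dB of a single link without interference or fading."""
    return tx_dbm - path_loss_db(d_m) - noise_power_dbm(bandwidth_hz)

# With the listed values: 10 MHz bandwidth gives a noise floor of -104 dBm,
# and a 23 dBm transmitter at the 50 m D2D range sees roughly 79.2 dB of loss.
snr_at_50m = mean_snr_db(23.0, 50.0, 10e6)
```

Under these assumptions a D2D link at the maximum range is far from noise-limited, which is consistent with the later observation that interference, rather than transmit power, dominates the delay performance.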

Fig. 2 shows the average delay versus the average arrival
rate. For large traffic load, the transmission via D2D communication
has significant performance gain compared with
the conventional cellular transmission. This is mainly because
of the short distance between D2D transmitters and receivers
and their efficient spatial reuse. It can also be observed that
the proposed power control algorithm outperforms all the
baselines, which verifies the accuracy of the priority function
approximation in the proposed power control scheme. Note
that the delay of the proposed scheme with small
arrival rate is not 0 but a small value, because the transmitters
cannot transmit data in all time slots.

Fig. 3 shows the average delay versus the average transmit
power. The proposed power control scheme also achieves
better performance than the other baseline schemes. A larger transmit
power increases the received power of the desired
signal but, meanwhile, causes more serious interference
to other D2D pairs. Because of this two-fold effect of the
transmit power, the change of the average delay performance is
relatively small with adjustment of the average transmit power.

Fig. 4 shows the average delay versus the D2D communication
range. Unlike the average transmit power, the D2D
communication range affects the received power of the desired
signal without increasing the interference directly, so the
average delay changes considerably with the D2D communication
range. It can be seen that the proposed power control scheme
outperforms the baselines when the D2D communication range
is small. For large D2D communication ranges (i.e., 100 m
and 125 m), all schemes achieve quite poor delay performance.
Note that since the carrier sensing distance $\delta$ is set to 100 m
here, the MAC cannot filter the large interference well. Thus,
the performance of the proposed power control scheme degrades
because the weak coupling property of the queue dynamics
does not hold when the D2D communication range is too large
compared to the carrier sensing distance.

Fig. 4. Performance comparison with different D2D communication ranges

Fig. 5. Effect of carrier sensing distance

Fig. 5 shows the effect of the carrier sensing distance $\delta$ on the
proposed power control scheme. As discussed before, a very
small sensing distance cannot filter the large interference or
guarantee the weak coupling property of the queue dynamics.
However, a very large sensing distance leads to inefficient
spatial reuse. An appropriate carrier sensing distance should
be selected to balance the tradeoff between the above two
aspects. From Fig. 5, we observe that the proposed scheme
achieves good delay performance over a wide range of
carrier sensing distances.

Table II illustrates the comparison of the MATLAB computational
time of the proposed solution, the baselines and
the brute-force value iteration algorithm [12] in one time slot.
Note that the computation time of Baseline 2 is the smallest
in all different K scenarios but it has the worst performance.
In addition, the computational time of our proposed scheme is
close to those of Baselines 3 & 4 and the difference is due to
the computation of the approximate priority function. Therefore,
our proposed scheme achieves significant performance
gain compared to all the baselines, with small computational
complexity cost.
TABLE II

Comparison of the MATLAB computational time

Scheme                        K = 5      K = 10     K = 15
Baseline 2                    < 1 ms     < 1 ms     < 1 ms
Baseline 3 & 4                0.007 s    0.015 s    0.029 s
Proposed Scheme               0.046 s    0.091 s    0.143 s
Brute-Force Value Iteration   > 10^5 s   > 10^5 s   > 10^5 s



VI. Conclusion 

In this paper, we consider the dynamic power control for
delay-aware D2D communications by formulating the associated
stochastic optimization problem as an infinite horizon
average cost MDP. To deal with the curse of dimensionality,
a closed-form approximate priority function is derived using
perturbation analysis. Both the analysis and the numerical
results show that the approximation error is small and will
vanish if the cross-channel path gain goes to 0. Based on
the closed-form approximation, we propose a low complexity
iterative power control algorithm and discuss some implementation
issues for practical systems. Finally, simulation
results show that the proposed power control algorithm has
significant performance gains in delay performance compared
with various state-of-the-art baselines.


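Appendix A: Proof of Theorem 1

Following Prop. 4.6.1 of [12], the sufficient conditions for
the optimality of Problem 1 are that $(\theta^*, \{V^*(\chi)\})$
solves the following Bellman equation:

$$\begin{aligned}
\theta^* \tau + V^*(\chi) &= \min_{\Omega(\chi)} \Big[ c(\mathbf{Q}, \Omega(\chi))\tau + \sum_{\chi'} \Pr[\chi' | \chi, \Omega(\chi)]\, V^*(\chi') \Big] \\
&= \min_{\Omega(\chi)} \Big[ c(\mathbf{Q}, \Omega(\chi))\tau + \sum_{\mathbf{Q}'} \sum_{\mathbf{H}'} \sum_{\sigma'} \Pr[\mathbf{Q}' | \chi, \Omega(\chi)] \Pr[\mathbf{H}'] \Pr[\sigma']\, V^*(\chi') \Big] \qquad (21)
\end{aligned}$$

and $V^*$ satisfies the condition in (10) for all admissible policies
$\Omega$. Then $\theta^* = \min_{\Omega} L(\Omega)$. Taking expectation w.r.t. $\mathbf{H}$ and $\sigma$
on both sides of (21) and denoting $V^*(\mathbf{Q}) = \mathbb{E}[V^*(\chi) | \mathbf{Q}]$,
we obtain the equivalent Bellman equation in (10) in Theorem 1.
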


Appendix B: Proof of Theorem 2

In the proof, we shall first establish the relationship between 
the equivalent Bellman equation in © in Theorem [2] and the 
approximate Bellman equation in (l22l) in the following Lemma 
[2l Then, we establish the relationship between the approximate 
Bellman equation in (l 22 l) in the Lemma El and the PDE in (fTTh 
in Theorem El 

1. Relationship between the Equivalent Bellman and the 
Approximate Bellman Equation: We establish the following 
lemma on the approximate Bellman equation to simplify the 
equivalent Bellman equation in ©: 

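Lemma 2 (Approximate Bellman Equation): For any given
weights $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$, if

• there is a unique $(\theta^*, \{V^*(\mathbf{Q})\})$ that satisfies the Bellman
equation and transversality condition in Theorem 1,

• there exist $\theta$ and $V(\mathbf{Q})$ of class¹⁰ $C^2(\mathbb{R}_+^K)$ that solve the
following approximate Bellman equation:

$$\theta = \mathbb{E}\Big[ \min_{\Omega(\chi)} \Big[ c(\mathbf{Q}, \Omega(\chi)) + \sum_{k=1}^{K} \frac{\partial V(\mathbf{Q})}{\partial Q_k} \big( \lambda_k - \sigma_k C_k(\mathbf{H}, \Omega(\chi)) \big) \Big] \,\Big|\, \mathbf{Q} \Big], \quad \forall \mathbf{Q} \in \mathcal{Q} \qquad (22)$$

and for all admissible control policies $\Omega$, the transversality
condition in (10) is satisfied for $V$,

then, we have

$$\theta^* = \theta + o(1), \quad V^*(\mathbf{Q}) = V(\mathbf{Q}) + o(1), \quad \forall \mathbf{Q} \in \mathcal{Q} \qquad (23)$$

where the error term o(1) asymptotically goes to zero for
sufficiently small slot duration $\tau$. ■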

Proof of Lemma\2\ Let Q' = (Q[, • • • , Q k ) = Q (t + 1) 
and Q = (Qi,--- ,Q k ) = Q(t). For the queue dynamics 
in © and sufficiently small r, we have Q' k = Q k — 
a k C k (H,P) + A k r, (Vfc). Therefore, if V (Q) is of class 
C 2 (M^), we have the following Taylor expansion on V (Q'): 


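$$\mathbb{E}[V(\mathbf{Q}') | \mathbf{Q}] = V(\mathbf{Q}) + \sum_{k=1}^{K} \frac{\partial V(\mathbf{Q})}{\partial Q_k} \Big( \lambda_k - \mathbb{E}\big[ \sigma_k C_k(\mathbf{H}, \Omega(\chi)) \,\big|\, \mathbf{Q} \big] \Big) \tau + o(\tau) \qquad (24)$$
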

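For notational convenience, let $F_\chi(\theta, V, \Omega(\chi))$ denote the
Bellman operator:

$$F_\chi(\theta, V, \Omega(\chi)) = \sum_{k=1}^{K} \frac{\partial V(\mathbf{Q})}{\partial Q_k} \big( \lambda_k - \sigma_k C_k(\mathbf{H}, \Omega(\chi)) \big) - \theta + c(\mathbf{Q}, \Omega(\chi)) + \nu G_\chi(V, \Omega(\chi)) \qquad (25)$$
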


for some smooth function G x and u = o(l) (w.r.t. 
r). Denote F X (9,V) = m i n n(Q) F x (9, V, fi(x))- Suppose 
(9*,V*) satisfies the Bellman equation in l©, we have 
E [F x (9*,V*) IQ] =0, VQ e Q. Similarly, if (9,V) 
satisfies the approximate Bellman equation in d22l> . we have 

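$$\mathbb{E}\big[ F_\chi^\dagger(\theta, V) \,\big|\, \mathbf{Q} \big] = 0, \quad \forall \mathbf{Q} \in \mathcal{Q} \qquad (26)$$

where $F_\chi^\dagger(\theta, V) = \min_{\Omega(\chi)} F_\chi^\dagger(\theta, V, \Omega(\chi))$ and
$F_\chi^\dagger(\theta, V, \Omega(\chi)) = F_\chi(\theta, V, \Omega(\chi)) - \nu G_\chi(V, \Omega(\chi))$.

We then establish the following lemma.

Lemma 3: If $(\theta, V)$ satisfies the approximate Bellman equation
in (22), then $|\mathbb{E}[F_\chi(\theta, V) | \mathbf{Q}]| = o(1)$ for any $\mathbf{Q} \in \mathcal{Q}$. ■

Proof of Lemma 3: For any $\chi$, we have
$F_\chi(\theta, V) = \min_{\Omega(\chi)} \big[ F_\chi^\dagger(\theta, V, \Omega(\chi)) + \nu G_\chi(V, \Omega(\chi)) \big] \ge
\min_{\Omega(\chi)} F_\chi^\dagger(\theta, V, \Omega(\chi)) + \nu \min_{\Omega(\chi)} G_\chi(V, \Omega(\chi))$. Besides,
$F_\chi(\theta, V) \le \min_{\Omega(\chi)} F_\chi^\dagger(\theta, V, \Omega(\chi)) + \nu G_\chi(V, \Omega^\dagger(\chi))$,
where $\Omega^\dagger = \arg\min_{\Omega(\chi)} F_\chi^\dagger(\theta, V, \Omega(\chi))$. Since
$\mathbb{E}[\min_{\Omega(\chi)} F_\chi^\dagger(\theta, V, \Omega(\chi)) | \mathbf{Q}] = 0$ according to (26),
and $F_\chi^\dagger$ and $G_\chi$ are all smooth and bounded functions, we
have $|\mathbb{E}[F_\chi(\theta, V) | \mathbf{Q}]| = o(1)$ (w.r.t. $\tau$). ■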

We establish the following lemma to prove Lemma 2.

Lemma 4: Suppose $\mathbb{E}[F_\chi(\theta^*, V^*) | \mathbf{Q}] = 0$ for all $\mathbf{Q}$ together
with the transversality condition in (10) has a unique
solution $(\theta^*, V^*)$. If $(\theta, V)$ satisfies the approximate Bellman
equation in (22) and the transversality condition in (10), then
$\theta = \theta^* + o(1)$, $V(\mathbf{Q}) = V^*(\mathbf{Q}) + o(1)$ for all $\mathbf{Q}$, where o(1)
asymptotically goes to zero as $\tau$ goes to zero. ■

Proof of Lemma 4: Suppose for some $\mathbf{Q}'$, $V(\mathbf{Q}') =
V^*(\mathbf{Q}') + O(1)$ (w.r.t. $\tau$). From Lemma 3, we have
$|\mathbb{E}[F_\chi(\theta, V) | \mathbf{Q}]| = o(1)$ (w.r.t. $\tau$). Letting $\tau \to 0$, we
have $\mathbb{E}[F_\chi(\theta, V) | \mathbf{Q}] = 0$ for all $\mathbf{Q}$ and the transversality
condition in (10). However, $V(\mathbf{Q}') \neq V^*(\mathbf{Q}')$ due to
$V(\mathbf{Q}') = V^*(\mathbf{Q}') + O(1)$. This contradicts the condition
that $(\theta^*, V^*)$ is a unique solution of $\mathbb{E}[F_\chi(\theta^*, V^*) | \mathbf{Q}] = 0$ for all
$\mathbf{Q}$ and the transversality condition in (10). Hence, we must
have $V(\mathbf{Q}) = V^*(\mathbf{Q}) + o(1)$ for all $\mathbf{Q}$. Similarly, we can
establish $\theta = \theta^* + o(1)$. ■

10 $f(\mathbf{x})$ ($\mathbf{x}$ is an N-dimensional vector) is of class $C^2(\mathbb{R}_+^N)$ if the first and
second order partial derivatives of $f(\mathbf{x})$ w.r.t. each element of $\mathbf{x}$ are continuous
when $\mathbf{x} \in \mathbb{R}_+^N$.


2. Relationship between the Approximate Bellman Equation 
and the PDE: For notation convenience, we write J (Q) in 
place of J (Q; L 6 ). It can be observed that if (c°°, {J (Q)}) 
satisfies CD, it also satisfies (122) . Furthermore, since J (Q) = 
0 (XAi Ql)’ then li m t^ooE n [J (Q(f))] < oo for any 
admissible policy El. Hence, J ( Q) = 0(J2k=i Qk) satisfies 
the transversality condition in G3- Next, we show that the 
optimal policy ft J * obtained from CD is an admissible control 
policy according to Definition [2 

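Define a Lyapunov function as $L(\mathbf{Q}) = J(\mathbf{Q})$.
We define the conditional queue drift as $\Delta(\mathbf{Q}) =
\mathbb{E}^{\Omega^{J*}}\big[ \sum_{k=1}^{K} (Q_k(t+1) - Q_k(t)) \,\big|\, \mathbf{Q}(t) = \mathbf{Q} \big]$ and the
conditional Lyapunov drift as $\Delta L(\mathbf{Q}) = \mathbb{E}^{\Omega^{J*}}\big[ L(\mathbf{Q}(t+1)) -
L(\mathbf{Q}(t)) \,\big|\, \mathbf{Q}(t) = \mathbf{Q} \big]$. We first have the following relationship
between $\Delta(\mathbf{Q})$ and $\Delta L(\mathbf{Q})$: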


AL(Q) > E 1 




K 




jk =i 


dQk 


(a) 

> A(Q) 


Q (t) = Q 


(27) 


if at least one of {Qk : Vfc} is sufficiently large, where (a) 
is due to the condition that j ° J qq'^ : Vfc j are increasing 
functions of all Qk . 

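Since $(\lambda_1, \ldots, \lambda_K)$ is strictly interior to the stability region
$\Lambda$, there exists $\overline{\boldsymbol{\lambda}} = (\lambda_1 + \kappa_1, \ldots, \lambda_K + \kappa_K) \in \Lambda$ for some
positive $\boldsymbol{\kappa} = \{\kappa_k : \forall k\}$ [24]. From Corollary 1 of [35], there
exists a stationary randomized QSI-independent policy $\Omega$ such
that

$$\sum_{k=1}^{K} \mathbb{E}^{\Omega}\big[ \gamma_k P_k \,\big|\, \mathbf{Q}(t) = \mathbf{Q} \big] = P(\boldsymbol{\kappa})$$

$$\mathbb{E}^{\Omega}\big[ \sigma_k C_k(\mathbf{H}, \mathbf{P}) \,\big|\, \mathbf{Q}(t) = \mathbf{Q} \big] \ge \lambda_k + \kappa_k, \quad \forall k \qquad (28)$$

where $P(\boldsymbol{\kappa})$ is the minimum average power for the system
stability when the arrival rate is $\overline{\boldsymbol{\lambda}}$. The Lyapunov drift $\Delta L(\mathbf{Q})$
is given by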


AL(Q) + E nJ * 


K 


E 7 fc-Pfci 


Lfc=i 


QCO = Q 


K 




fe=i 


<9Q/c 


■ E 




E ( ^P k r - ^^-a k C k (H, P)r 


,k=1 




QCO = Q 


<^A t r 


fc=l 

+ E^ 

K 


dQk 


K 


E ( dk p k T - a k C k (H , P)r 


,k=1 




(<0 ^<9L(Q) , 

< - E “aTv KfeT + - P ( #t ) T 


fc=i 




Q(f) = Q 
(29) 


if at least one of {Q k : Vfc} is sufficiently large, where 

( b ) is due to W* achieves the minimum of CD and 

(c) is due to (l28k Combining d29l) with <f27t . we have 

A(Q) < AL(Q) < - J2k=i 4to!r KT + P ( K ) T < 0 if 
at least one of {Q k : Vfc} is sufficiently large. Therefore, 
E[A fc -G fe (H,f2 J * ( X ))|Q] < 0 when Q k > Q k for some 

large Q k . Let <f> k {r, Q) = In (E[e(^“ G ' s(H ’ nJ * (x)) ) r |Q]) 
be the semi-invariant moment generating function of A k — 
G k (H,n J *( X )). Then, 4>k(r, Q) will have a unique positive 
root r%{ Q) (4>k(fl( Q), Q) = 0) J^|. Let = r k ( Q), where 
Q = (Q i? • • • 5 Qk )• Using the Kingman bound ll36l result 
that Fk(x) = Pr [Qk > x] < e~ r ^ x , if x > Xk for sufficiently 
large x k , we have 


E n * [J (Q)] 


K r 


<c 


fOO 

E En ' , ‘ [Ql] = C H / Pr [Ql > 3 ] ds 

k=1 k=1 

^ r°° 

<CY] / P fc (s 1 / 3 )ds+ / Pfe(s 1/3 )d, 

ib=i L ' 70 ^ 

x r r 

<cE /_ 

J X 


k=l 


xl + I e TkS ' ds 


< oo 


(30) 


for some constant C. Therefore, 12 J * is an admissible control 
policy and we have V (Q) = J (Q) and 0 = c°°. 

Combining Corollary [2 we have V * (Q) = J (Q) + o(l) 
and 0* = c°° + o(l) for sufficiently small r. 


Appendix C: Proof of Lemma 1

We first prove that J (Q; 0) = Y2k =l (Qk)- ^he PDE in 

CEB for the base system is 


E 


mm 

°(x) 


r J2(^+7k p k 


+ 


k =l 

dJ_ (Q; 0) 

dQk 


(At — H,P 


Q 


(31) 


— c°° = 0 


We have the following lemma to prove the decomposable 
structures of J (Q; 0) and c°° in (1311) . 

Lemma 5 (Decomposed Optimality Equation): Suppose 
there exist c£° and Jk (Qk) £ C 2 (M + ) that solve the 
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following per-flow optimality equation (PFOE): 


E 


mm 

P k > o 


pk^+ikPk 

A/c 


J' k {Qk)(\k-OkC° k {H kk ,P k )) 


Qk 


Appendix D: Proof of Corollary [T] 

First, we obtain the highest order term of Jk{Qk )• The 
(32) series expansions of E\{x) and e x are given by 


= 0 


oo 

Ei{x ) = - 7 eu - 111.x - 

n— 1 


(-xT 

n\n 


oo 


E 

n=0 


n! 


(37) 


where C%(H kk ,P k ) = log 2 (l + Then, J(Q;0) = 

FiE (Qfe) and c °° = T,k =i c fc° satisfy HQ- ■ 

Lemma [5] can be proved using the fact that the dynamics of 
the K queues at the transmitters are decoupled when L 6 = 0. 
The details are omitted for conciseness. 

Next, we solve the PFOE in (32). The optimal transmit
power from (32) is given by

$$P_k^* = \left( \frac{J_k'(Q_k)\,\sigma_k}{\gamma_k \ln 2} - \frac{\Gamma N_0}{H_{kk}} \right)^{+} \qquad (33)$$

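The per-flow power in (33) has a water-filling structure with a cutoff on the direct channel gain, which a short Python sketch makes explicit. This is an illustration only: `j_prime` stands for the derivative $J_k'(Q_k)$ and is assumed to be supplied by the caller, and both helper names are hypothetical.

```python
import math

def per_flow_power(j_prime, sigma, gamma, h_kk, n0, sinr_gap=1.0):
    """Transmit power of a single decoupled flow, following the form of (33).

    j_prime  : derivative of the per-flow priority function at the current queue
    sigma    : MAC output (1 if the transmitter is active, 0 otherwise)
    gamma    : power price of the flow
    h_kk     : direct channel gain, n0: noise power, sinr_gap: SINR gap
    """
    water_level = j_prime * sigma / (gamma * math.log(2.0))
    return max(water_level - sinr_gap * n0 / h_kk, 0.0)

def cutoff_gain(j_prime, gamma, n0, sinr_gap=1.0):
    """Smallest direct gain for which an active flow transmits at all."""
    return sinr_gap * n0 * gamma * math.log(2.0) / j_prime
```

A longer queue raises `j_prime`, which raises the water level and lowers the cutoff gain, so a backlogged flow transmits on worse channel states than a lightly loaded one.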


Substituting the optimal transmit power P£ to (l32l) . and using 
the fact that cr follows a Bernoulli distribution with mean 
| jV - fe 1 | +1 (from Assumption 0 and Hkk follows a negative 
exponential distribution with mean Lkk (from Assumption 0, 
we calculate the expectations in (l32l) as follows: 


Using (1771) . (IT4b induces that Qk{y) = 0{ylny) and Jk{y) = 
0{y 2 lny) as y -A oo. In other words, we have Siylny < 
Qk{y) < S[yIny when y oo for some constants 8\ and 
and S 2 y 2 In y < Jk{y) < l n 2 / when y -a 00 for some 
constants £2 and .Therefore, 


£2 


Qk/S[ \ 2 ( Qk/K 


W(Q k /S[)J \W(Q k /5 ' 1 ) 


<^2 


/ Qk/Si 


In 




< Tfe(y) 

Qk/fii 


(38) 


where W is the Lambert function f38ll . Since W(x) = C7(ln x) 
for sufficiently large x El, we conclude that Jk{Qk ) = 

° (hTor) as Qk ->■ OO. 

Next, we obtain the coefficient of the highest order term 
Using (ITTlh the PFOE equation in (l36b implies 


E[7fcP fc *|Q fc ] 

1 r ( Wk) 

(|A4| +1 )L kk Je V ln2 

J k\ Q k) 


1 

|A4| +1 



A/Q r 7 fc ln 2 

J ki ( Qk) L kk 


X ) 


jkN ,pT / NpPjk ln 2 \ 

L k k \J'k (QQ Lkk J 


(34) 


Using the same integration region, we have 


E [a k log 2 (1 + P^H kk /(TN 0 ))\Q k ] 
1 E ( In 2 \ 

(l-Vfel + 1 ) ln 2 1 \J' k (Qk)L kk J 


(35) 


where E±(z) = f^° !L j—dt is the exponential integral func- 


tion. We then calculate 
Qk = 0 , we have 


-k • 


Since (|32 

= E 


E [a k log 2 (l + P^H kk /(TN 0 ))\Q k = 0 


should hold when 

7 kP* k 

= A k 


Q k = 0 ] and 
Substituting 


these into (l34k we can calculate c£° as shown in Lemma [I] 
Substituting (l34k (l35k and c£° into (l32l) and letting ak = 
Aro 7 fc ln 2 , we j iave t j ie following ODE: 


Q/c . 

A/c |A 4 | + 1 



ATq r7^. In 2 

J 'ki ( Qk) L kk 


7 /c Apr ^ / A 0 r 7 /c In 2 A 

Lkk \J'k(Qk) Lkk J 


C T + L'k {Qk) A/c 


- 4 (Qfe) 


i 

(|A41 + 1) In 2 


£1 


/ A 0 r7/c ln2 A 

\J' k {Qk) Lkk) 


= 0 


(36) 


According to Section 0.1.7.3 of 03 . we can obtain the 
parametric solution of (l36t as shown in (ITU) in Lemma |TJ 


J' k (Q k )\n(j' k (Q k )) = 


/?fc(|A 4 | + 1 ) ln 2 

Afc 


Q k + o(Q k ) (39) 


Since ,4 (Q k ) = O ( i u (Q,.) ) ’ dlere ex i st constants (5 and S' 


such that 

Ql 


=7A 


In (Qk) 

Qk 


< Jk (Qk) < S' 


Ql 


< J' k (QQ < A' 


In (Qk) 
Qk 


ln (Qk) ~ ~ In (Qk) 

=> In (A) + ln (Q k ) ~ lnln(Q fc ) < In (J' k (Qk)) 

< In (A') + ln (Q k ) ~ lnln(Q fc ) 

=>AQ k + o (Q k ) < J' k (Qk) In ( J'k (Qk)) < A 'Q k + o (Q k ) 

(40) 

where A and A' are some constants that are independent 
of the system parameters. Comparing it with <l39h . we have 
A, A' oc fa(l^+i)in 2 ^ § §l K ^(l^l+ 1 ) ln2 > where 
x oc y means that x is proportional to y. Finally, we 


conclude that Jk{Qk) = 


/3fc(|A/fcl+l) Qk 


2X k log 2 {Qk) 


and J' k (Q k ) = + 0 ( 5 ^^). 


of 91 ) 

\ i°g 2 (Qfc )) 


Appendix E: Proof of Theorem 3

We first write Hkj = LkjEkj , where Hkj is the short-term 
fading path gain. Taking the first order Taylor expansion of 
the L.H.S. of the PFOE in O at L kj = 0 (Vfc + j, d kj > 6), 
Pk = Pk (where P£ minimize the L.H.S. of pit), and using 
parametric optimization analysis ll39lL we have the following 
result regarding the approximation error: 

K 

J (Q; L s ) — J (Q; 0) = 7^ L ij J ij (Q) + 0((L s ) 2 ) 

i= 1 3^i, 

j^Afi(S) 

( 41 ) 
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where we have L 5 = 0(j?) according to Assumption |2| a k = h 72 l h en the second order derivative of 

Jjj(Q) captures the coupling terms in J (Q) satisfying: / (P C ,L 5 ) is calculated as: 


52 ( V-E 

+E 


ffc log 2 


^ ^ PkLkkHkk j 


riv 0 I 


Q 


dJjj (Q) 
dQk 


In 2 TiV 0 2 + NoPfLuHii 


Q 


= Oi 


(42) 


with boundary condition Jij (Q) \ q ._ 0 = 0 or 

^■(Q)! = 0 , and Okj = * s constant (where 

we treat 0 as a function of {Lij : Vi j}). According to (l34l) 
and (1351) . we have 


E 


a k log 2 1 + 


PkLkkHkk \ 
No I 


Q 


l 


(|A4W| + l)ln2 


O (In Q k ) 


E 


J'i(Qi) aiP* LuHaHij 


Q 


In 2 FNq + NoP* La Ha 

(|MWI + l)(|AGWI + l)(ln2)2 7 ,.JV 0 

___ q f _ QiQj 

(In 2) 2 \i\j^jNo V 1°§2 Qi lo g 2 Qj J 

Substituting these calculation results into (l42k using 3.8.4.7 
of ®] and taking into account the boundary conditions, 
we obtain that f (Q) = D n O wh ere 

Dij = 2 (fn 2 )vv 7 + ivi • Substituting it to (1411) . we obtain the 
approximation error in Theorem [3] 


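Appendix F: Proof of Corollary 2

By definition, the problem in (18) is
equivalent to

$$\min_{\mathbf{P}} \sum_{k \in \mathcal{A}(\delta)} \left( \gamma_k \sigma_k P_k - \frac{\partial V(\mathbf{Q})}{\partial Q_k} C_k\big( \mathbf{H}, \{\sigma_k P_k : k \in \mathcal{A}(\delta)\} \big) \right) \qquad (43)$$
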


where *4(£) = {k : a k = 1 } is the set of active 

transmitters for a given S, C \ c (H, {a k Pk • k G *4(£)}) = 
logo fl + 2 . H kk cr k p k \ p) eno t e objective 

function in (l43t as / (P,ZE). We have the following lemma 
on the convexity for / (P, L s ). 

Lemma 6 (Convexity of $f(\mathbf{P}, L^\delta)$ for Sufficiently Small $L^\delta$):
$f(\mathbf{P}, L^\delta)$ is a convex function of $\mathbf{P} = \{P_k : k \in \mathcal{A}(\delta)\}$
when $L^\delta$ is sufficiently small. ■

Proof: We adopt the following argument to prove the 
convexity ED: given two feasible points xi and X 2 , define 
$g(t) = f(t\mathbf{x}_1 + (1 - t)\mathbf{x}_2)$, $0 \le t \le 1$, then $f(\mathbf{x})$ is a convex
function of $\mathbf{x}$ if and only if $g(t)$ is a convex function of $t$,
which is equivalent to $\frac{d^2 g(t)}{dt^2} \ge 0$ for $0 \le t \le 1$.

Consider the convex combination of two feasible solutions 
p(i) = {pW : k £ 4(5)} and p( 2 ) = {pf> : k £ 4(5)} 

as follows: P c = {P k = tPjj^ + (1 — f)P® : k £ 4(5)} 

and 0 < t < 1. We write Hkj = LkjHkj , where Hkj is 
the short-term fading path gain. Denote P_/c = {Py : Vj ^ 

kij C v4(^)}, -Rfe(P-fe) = W 0 + Y2j^k,jeA(5) LkjHkjPj and 


d 2 /(P c ,L 5 ) 


df 2 


= E 

k£A(5) 


dk [ ( Rk(P-k) + r^LkkHkkdkPk 


+ f LkkLkkHkkak { pi 7 - p i 2) ) 


-■Rfe 2 (P-/c) 


/(LR4P4 

V df 


(44) 


wher e = E^fcje^) L kj H kj (Pj - Pf) does 

not depend on t. 

As Z/ becomes sufficiently small, is proportional 

to L s and + j L kkHkk°k(Pk ) ~ P^) is dominate 

by jL kk H kk a k (Pl l) -Pl 2) ). R k 2 (P c _ k ) is P r °- 
portional to ( L 6 ) 2 , and hence it has little impact and can be 
ignored. Therefore, we have 

d 2 /(P c ,E> v 

d<2 ~ khw 

( d ^4P c _ fc ) + 1 Lkk jj kkak (pW _ pmjj j >f) (45) 

for sufficiently small $L^\delta$. Therefore, $f(\mathbf{P}, L^\delta)$ is convex for
sufficiently small $L^\delta$. ■
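The line-restriction argument used in the proof of Lemma 6 can also be checked numerically. The Python sketch below evaluates a per-stage objective of the form used in (43) along the segment between two feasible power vectors and returns the smallest discrete second difference. It rests on several assumptions: only active links are passed in (so $\sigma_k = 1$), `w[k]` stands for $\partial V(\mathbf{Q})/\partial Q_k$, `H[k, j]` is the gain from transmitter j to receiver k, and the rate is $\log_2(1 + \mathrm{SINR}/\Gamma)$. The function names are hypothetical.

```python
import numpy as np

def objective(P, w, gamma, H, N0, Gamma=1.0):
    """Weighted power cost minus weighted sum rate for the active links."""
    P = np.asarray(P, dtype=float)
    direct = np.diag(H)
    I = N0 + (H - np.diag(direct)) @ P          # noise plus interference
    rate = np.log2(1.0 + direct * P / (Gamma * I))
    return float(np.sum(gamma * P - w * rate))

def min_second_difference(f, x1, x2, num=201):
    """Smallest discrete second derivative of g(t) = f(t*x1 + (1-t)*x2) on [0, 1]."""
    t = np.linspace(0.0, 1.0, num)
    g = np.array([f(ti * x1 + (1.0 - ti) * x2) for ti in t])
    h = t[1] - t[0]
    return float(np.min(g[2:] - 2.0 * g[1:-1] + g[:-2]) / h**2)

# Example with no cross-link interference, where the objective is convex.
w = np.array([1.0, 2.0])
gamma = np.ones(2)
H_decoupled = np.diag([1.0, 0.8])
x1 = np.array([0.1, 2.0])
x2 = np.array([3.0, 0.2])
curvature = min_second_difference(
    lambda P: objective(P, w, gamma, H_decoupled, 0.1), x1, x2)
```

A non-negative result (up to rounding) on many random segments is consistent with convexity, although it does not prove it. Repeating the check with large off-diagonal entries in `H` is a simple way to look for segments with negative curvature when the cross gains are not small.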

For sufficiently large $\delta$, $L^\delta$ is sufficiently small, so the
problem in (43) is convex according to Lemma 6, and hence (18)
is convex. Furthermore, since the limiting point $\mathbf{P}(\infty)$ of
Algorithm 1 is a stationary point of the problem (18), it is also
the unique global optimal point of (18).


dk 


Pk(P — k ) T p LkkHkkdkPk 
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